Comparative Analysis of Preference in Contemporary and Earlier Texts Using Entropy Measures

Research in computational textual aesthetics has shown that there are textual correlates of preference in prose texts. The present study investigates whether textual correlates of preference vary across different time periods (contemporary texts versus texts from the 19th and early 20th centuries). Preference is operationalized in different ways for the two periods, in terms of canonization for the earlier texts, and through sales figures for the contemporary texts. As potential textual correlates of preference, we measure degrees of (un)predictability in the distributions of two types of low-level observables, parts of speech and sentence length. Specifically, we calculate two entropy measures, Shannon Entropy as a global measure of unpredictability, and Approximate Entropy as a local measure of surprise (unpredictability in a specific context). Preferred texts from both periods (contemporary bestsellers and canonical earlier texts) are characterized by higher degrees of unpredictability. However, unlike canonicity in the earlier texts, sales figures in contemporary texts are reflected in global (text-level) distributions only (as measured with Shannon Entropy), while surprise in local distributions (as measured with Approximate Entropy) does not have an additional discriminating effect. Our findings thus suggest that there are both time-invariant correlates of preference, and period-specific correlates.


Introduction
What makes a text "successful", in the sense that it sells well, reaches a broad readership and/or acquires prestige among educated readers and critics? Is it promotion, network effects, economic or social circumstances, or perhaps the "quality" of the text itself? These questions have recently been addressed in a variety of studies in the field of computational aesthetics, aiming to identify observable correlates of preference in the structure of a text [1][2][3][4][5][6][7]. In empirical aesthetics the term "preference" is used to capture aesthetic attitudes towards cultural artefacts [8]. Such attitudes can be held both at an individual level (specific readers enjoy specific (types of) books) and at a community level (specific types of texts, and their authors, may obtain recognition and acquire prestige) [9][10][11].
On the assumption that aesthetic experience can have a foundation in the cultural artifact itself, a natural question to ask is whether, or to what extent, correlations between properties of a work of art, such as a literary text, and the aesthetic response in readers, are invariant across time, space, and cultural environments, or whether they are dependent on such variables. In the present study we address this question by studying correlations between structural properties of texts and degrees of (community-level) preference across two time periods. Specifically, the central question is to what extent the textual determinants of preference in the 19th and early 20th centuries were the same as, or different from, the tation and surprise [7,30,31] (moreover, see [32,33] for a discussion on "unification" and "diversification" in text).
Specifically, the present study investigates the differences and similarities in the relationship between the degree of surprise in the textual structure and (community-level) preference, in two time periods, the 19th and early 20th centuries, and the contemporary period. Preference is operationalized as canonization for the earlier texts, and in terms of sales figures for the contemporary texts. Similarities can be expected on the assumption that certain determinants of preference are time-invariant (universal), and do not vary significantly with the readership. Differences can be expected because writing styles are known to vary from one period to the next [25], and because literature is embedded into socio-cultural contexts, with changing aesthetic preferences in all domains of culture (music, painting, architecture, etc.). Moreover, the two operationalizations of preference can be expected to have different types of reflexes.
As reviewed in more detail in [7], preference has been operationalized in terms of the scope of the readership in previous studies under different terms, such as "success" [27], "popularity" [34], being "professional" vs. "amateur" [1], or "information-based energy" [35]. Data from websites and social networks have been used in some previous studies to model readers' preference, for instance, the download counts from the website of the Gutenberg Project [2], or ratings of readers on the website Goodreads [3,5].
Some previous studies have referred to the Nobel Prize as a gauge for high quality or success. For example, Febres and Jaffe [4] analysed the categories of Nobel laureates and non-Nobel laureates in two languages, English and Spanish, using global properties, such as entropy, lexical diversity and word frequency distribution in texts. Their results showed that statistical measures can be predictive of the category of texts, with a higher performance for Spanish compared to English texts. Bizzoni et al. [6] classified Nobel prize winners from other texts using the fractality of sentiment arcs. They showed that the distribution of self-similarity measures in the two text categories under analysis differed, and that the degree of fractality of higher-quality texts is likely to be located in a specific range of values.
Mohseni et al. [15] approached the discrimination of canonical from non-canonical texts using textual properties of texts represented in the form of a series. They used sentence length, the frequencies of POS tags per sentence, the lexical diversity metric MTLD, and topic probabilities to numerically represent the structure of a text, and determined the variance and long-range correlations of the series corresponding to the texts. Training a classifier with the calculated values, they were able to distinguish fictional from nonfictional and, within the fictional category, canonical vs. non-canonical English texts with acceptable accuracy.
Success has also been defined based on sales figures. Yucesoy et al. [17] and Wang et al. [18] analysed texts in the New York Times Bestseller lists and Vasyliuk et al. [19] investigated bestseller books on Amazon. However, they did not analyse the texts of books, but rather restricted their analyses to more straightforward statistical information and metadata, such as the time of publication, number of reviews, genre, and price, and related the success of the texts to non-textual factors.
In the present study we adopt the approach to textual aesthetics proposed by Mohseni et al. [7,15]. We assume that a pleasant reading experience emerges from an interplay of predictability and surprise. Previous work has shown that canonical literature differs from non-canonical literature in its degree of predictability. Mohseni et al. [7] analysed two types of series derived from texts, sequences of sentence lengths and of frequencies of part-of-speech (POS) tags in fixed-size windows of text (see Section 2.3). Two entropic metrics were computed, Shannon Entropy (ShEn) and Approximate Entropy (ApEn), for the distributions of relevant text properties. ShEn measures (ir)regularity as a global structural property. ApEn determines (un)predictability as a sequential characteristic of underlying text property series (see Section 2.4). This method was also applied in the present study, with a different dataset. Note that the present study primarily focuses on the classification of texts on the basis of preference levels. The temporal dimension comes into play insofar as we compare texts of preference levels from two periods of time. We do not perform temporal classification, in the sense that the time of writing is the category of classification. Approaches to temporal classification are nevertheless summarized in the Supplementary Materials, Section S1.
The paper is organized into three sections. Section 2 contains a description of the data and methods. Section 3 presents the results, which are discussed in Section 4.

The Jena Corpus of Expository and Fictional Prose
The present study is based on the Jena Corpus of Expository and Fictional Prose (JEFP), version 2.0. The corpus was compiled for a comparison of different types of fictional and non-fictional texts from the 19th and early 20th centuries, here called "earlier" texts, and it has been used for the study of questions relating to empirical aesthetics [7,15]. The JEFP Corpus comprises three sub-corpora: canonical/fictional, non-canonical/fictional and non-fictional (Table 1). The canonical sub-corpus consists of 76 texts that form part of the Western literary canon according to Bloom [13]. It represents a collection of fictional texts that are widely known among the educated population, often taught in school and discussed in academic discourse. The category of non-canonical fictional texts comprises 130 texts that were obtained from the Project Gutenberg website. It represents the non-preferred earlier texts. Finally, the sub-corpus of non-fictional texts contains 185 texts from different genres such as architecture, astronomy, geology, geography, philosophy, psychology and sociology. These texts were also obtained from the Project Gutenberg website. For our comparative study, we also needed a corpus of contemporary texts to compare with the earlier texts in the JEFP corpus. Thus, we compiled a corpus which contained categories analogous to those of the JEFP corpus (preferred/fictional, non-preferred/fictional and non-fictional). We called this corpus the "Jena Corpus of Contemporary Expository and Fictional Prose" (JCEFP).
To compile the list of preferred contemporary texts, we used the New York Times Bestseller list, which is published weekly in the New York Times Book Review. Some books manage to appear on the list for several weeks, and some lose their rank after only one week in competition with other books. We selected ninety-three texts from lists of the New York Times Fiction Best Sellers published from 2000 to 2020. Our selection was based on lists taken from Wikipedia for each year.
To build the category of non-preferred contemporary texts, we used the website www.smashwords.com (accessed on 11 March 2021), which allowed us to search for texts based on various criteria, such as genre, length and price. In this part of the corpus, we only included freely available fictional texts, assuming that texts promising commercial success will not be distributed for free by a publisher. This part of the corpus consequently contains no bestsellers, as bestsellers would have to be bought. For a book to be free does not of course mean that the book is not read by anyone. In fact, free distribution could be an incentive for people interested in popular literature to read the texts. Moreover, if an author manages to publish a successful text later, their previous, less-successful texts may find more readers (as in the case of B. Obama's first book Dreams from my Father, for instance). Still, at the time of publication the texts are clearly non-bestsellers, and books that are not promoted by publishers. The non-preferred sub-corpus thus compiled by us contained 110 texts.
Non-fictional texts were randomly selected from different genres, e.g., philosophy, psychology, sociology and natural science, similar to the genres that we included for texts in the JEFP corpus. The contemporary version of the non-fictional sub-corpus contained 122 texts. Table 2 presents the summary statistics for the JCEFP corpus. As we selected bestselling books from lists from 2000 to 2020, the category of non-preferred non-fictional texts was also restricted to texts that were published after 2000. Table S1 in the Supplementary Materials lists all texts with the metadata. All texts in both the JEFP corpus and the JCEFP corpus were pre-processed in the same way. We removed the tables of contents and indices and cleaned up the texts partly manually and partly automatically using regular expressions to fix broken lines and hyphenated words.
To segment texts into sentences and to assign POS tags to tokens, we used the Stanza package for Python [36], a neural-based text processing toolbox with high accuracy. We used the toolbox with the default pre-trained model for English (UD English EWT, version 1.0.0 [37]).
Note that previous studies have shown no underperformance of taggers for texts from the 19th century. This is probably due to the fact that orthography was already standardized at that time. For instance, Schneider et al. [38] showed that if a POS tagger was trained on contemporary texts and applied to historical texts written after 1800, the performance would not drop. They also analysed the tagging errors and showed that most POS tagging mistakes were found in lower-level categories within the major classes; for example, between NN (noun, singular or mass) and NNP (proper noun, singular), and between VB (verb, base form) and VBP (verb, non-third person singular present). Such errors would not affect our results because we analysed the distribution of major word categories (see Section 2.3).

Properties Underlying Textual Structure
To analyse the structural organization of texts, we took the same approach as Mohseni et al. [7]. We represented and analysed texts by seven text properties: sentence length and the frequencies of six major parts of speech in fixed-size windows: Noun, Verb, Adjective, Adverb, Pronoun and Preposition. Sentence length was measured as the number of tokens in a sentence, including all words and punctuation marks. Each major part-of-speech (POS) included all relevant sub-categories. For example, plural, singular, common and proper nouns all were counted as Noun. All forms of verbs, base form, past tense, past participle and gerund, were treated similarly as Verb. Adjective and Adverb included simple, comparative and superlative types. Pronoun covered personal and possessive pronouns.
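The grouping of fine-grained tags into major classes, and the measurement of sentence length, can be sketched as follows. This is a minimal illustration and not the code used in the study: the Penn Treebank tag names in `MAJOR_CLASS`, the treatment of `IN` as Preposition, and the helper names `major_class` and `sentence_lengths` are assumptions made for the example.

```python
# Assumed mapping from Penn Treebank tags to the six major classes.
# The exact tag inventory used in the study may differ.
MAJOR_CLASS = {
    "NN": "Noun", "NNS": "Noun", "NNP": "Noun", "NNPS": "Noun",
    "VB": "Verb", "VBD": "Verb", "VBG": "Verb",
    "VBN": "Verb", "VBP": "Verb", "VBZ": "Verb",
    "JJ": "Adjective", "JJR": "Adjective", "JJS": "Adjective",
    "RB": "Adverb", "RBR": "Adverb", "RBS": "Adverb",
    "PRP": "Pronoun", "PRP$": "Pronoun",
    "IN": "Preposition",
}


def major_class(tag):
    """Return the major class of a fine-grained tag, or None if it has none."""
    return MAJOR_CLASS.get(tag)


def sentence_lengths(sentences):
    """Sentence length = number of tokens, punctuation marks included."""
    return [len(sentence) for sentence in sentences]


# Hypothetical tagged input: one list of (token, tag) pairs per sentence.
tagged = [
    [("She", "PRP"), ("reads", "VBZ"), ("old", "JJ"), ("books", "NNS"), (".", ".")],
    [("Stop", "VB"), ("!", ".")],
]
lengths = sentence_lengths(tagged)
classes = [major_class(tag) for sentence in tagged for _, tag in sentence]
```

Here `lengths` is the sentence-length series of the toy input, and `classes` is the flat sequence of major classes from which the windowed POS series are built.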
To build series of part-of-speech (POS) tags, we counted the number of each POS tag in subsequent windows of 25 tokens of text. As mentioned in Mohseni et al. [7], the window size does not have a significant effect on the results as long as it is within reasonable limits.
By windowing, we split each text into a sequence of fixed-length segments. Fixed-length segmentation eliminates undesirable effects of correlation between sentence length and frequencies of POS tags. Each window of text is called a "box". Each box is like a small bag of words, in which the internal structure of the text is ignored and only the frequency of POS tags is determined. We therefore call this approach a 'sequence of boxes' approach. If the order of the boxes in the sequence was taken into account, we analysed the underlying structural design of a text (as in the case of Approximate Entropy; Section 2.4). If we ignored the linear order of the boxes, we analysed the global distribution of POS tags in a text (as in the case of Shannon Entropy; Section 2.4).
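The 'sequence of boxes' construction can be illustrated with a short sketch. The function name `pos_series` is hypothetical, and dropping an incomplete final window is an assumption of this example; the study's own code may handle the end of a text differently.

```python
def pos_series(classes, target, window=25):
    """Count occurrences of `target` in consecutive, non-overlapping boxes.

    `classes` is the flat sequence of major POS classes of a text, one entry
    per token (None for tokens outside the six classes).
    Assumption: an incomplete final box is discarded.
    """
    n_boxes = len(classes) // window
    return [
        classes[i * window:(i + 1) * window].count(target)
        for i in range(n_boxes)
    ]


# Toy example with a window of 4 tokens instead of 25.
toy = ["Noun", "Verb", None, "Noun", "Noun", None, None, "Verb", "Noun"]
noun_series = pos_series(toy, "Noun", window=4)
verb_series = pos_series(toy, "Verb", window=4)
```

Each resulting series contains one count per box. Keeping the order of the counts preserves the sequential information used by Approximate Entropy, whereas Shannon Entropy only depends on how often each count occurs.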

Approximate Entropy and Shannon Entropy
To measure the degrees of (ir)regularity and (un)predictability in a series of text properties (Section 2.3) we used two entropy measures: Shannon Entropy (ShEn) and Approximate Entropy (ApEn) [39]. ShEn is a measure of global distribution and is computed as
$$\mathrm{ShEn} = -\sum_{x \in S_x} p(x) \log p(x),$$
where $S_x$ is the set of all possible events $x$ and $p(x)$ is the probability of event $x$. ShEn assumes that events happen independently of each other. This metric measures the degree of uncertainty. If the probability of all events is equal, the system has the highest uncertainty, and as a result, ShEn takes its maximum value.
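As a rough illustration of the global measure, Shannon Entropy of a series can be estimated from the relative frequencies of its values. The function name `shannon_entropy` and the use of base-2 logarithms are assumptions of this sketch, not details confirmed by the study.

```python
import math
from collections import Counter


def shannon_entropy(series):
    """Shannon Entropy (in bits, by assumption) of the values in `series`.

    Probabilities are estimated as relative frequencies, so the order of
    the values in the series has no influence on the result.
    """
    n = len(series)
    counts = Counter(series)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


uniform = shannon_entropy([0, 1, 2, 3])   # four equally likely values
constant = shannon_entropy([5, 5, 5, 5])  # a single value, no uncertainty
```

The uniform series reaches the maximum for four distinct values, while the constant series yields zero, in line with the description of ShEn as a measure of uncertainty.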
Conversely, ApEn is a measure of sequential organization (cf. Supplementary Materials, Section S2). It was proposed to measure the degree of (ir)regularity in a series according to the distance (dissimilarity) of sub-sequences to each other. As variation is an intrinsic characteristic of a series, in ApEn some level of fluctuation is "tolerated". If the difference between two sub-sequences lies within the "tolerance" level, it is assumed that "similarity" is not violated. In the computation of ApEn, the sub-sequence matches of length m are compared with sub-sequence matches of length m + 1. In a sequence with a high level of fluctuation, longer sub-sequences are less likely to be similar to each other, which in turn leads to a higher ApEn value. In exploratory studies, the parameters of ApEn, i.e., m and r, are usually set to 2 and 20% of the standard deviation, respectively (see, for example, [7,40,41,42]).
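The comparison of matches of length m with matches of length m + 1 can be made concrete with a small sketch that follows the standard definition of ApEn (self-matches included, maximum absolute difference as distance). It is not the implementation used in the study; the function names and the use of the population standard deviation for the tolerance r are assumptions.

```python
import math
import random


def _phi(series, m, r):
    """Average log-fraction of length-m sub-sequences lying within tolerance r."""
    n = len(series) - m + 1
    templates = [series[i:i + m] for i in range(n)]
    total = 0.0
    for a in templates:
        # Self-matches are counted, so `matches` is never zero.
        matches = sum(
            1 for b in templates
            if max(abs(x - y) for x, y in zip(a, b)) <= r
        )
        total += math.log(matches / n)
    return total / n


def approximate_entropy(series, m=2, r_factor=0.2):
    """ApEn with tolerance r = r_factor * standard deviation of the series."""
    mean = sum(series) / len(series)
    # Assumption: population standard deviation.
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    r = r_factor * std
    return _phi(series, m, r) - _phi(series, m + 1, r)


periodic = [1, 2] * 20                             # fully regular series
rng = random.Random(42)
irregular = [rng.randint(0, 9) for _ in range(200)]  # toy irregular series
apen_periodic = approximate_entropy(periodic)
apen_irregular = approximate_entropy(irregular)
```

The regular series yields a value close to zero, whereas the irregular toy series yields a clearly higher value. This quadratic-time version is only meant for short series; longer texts would call for a vectorized implementation.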
In our experiments we used both ShEn and ApEn. ShEn measures surprise based on global distributions. ApEn measures surprise based on (ir)regularities in the series. Note that a high degree of ApEn implies a high degree of ShEn but not vice versa. We first calculated the degree of irregularity (or unpredictability) in a series of text properties. On this basis we determined to what extent any observed differences originate from the global distribution of the features (ShEn), or from their sequential organization (ApEn). The code that we used to calculate features is accessible at https://github.com/mohsenim/Surprise (accessed on 5 February 2023).

Results
Our analyses implied a two-dimensional comparison. We carried out (i) a comparison of preferred and non-preferred fictional texts, for each period, and (ii) a comparison of the differences for each period. We used our two corpora, JEFP and JCEFP, which, as explained in Section 2.1, contained preferred texts (canonical texts in JEFP; bestselling contemporary texts in JCEFP), and non-preferred texts (non-canonical texts in JEFP; non-bestselling contemporary texts in JCEFP). In the following subsections, we start by presenting the results of the statistical analyses (Section 3.1) before turning to the results from classification (Section 3.2).

Statistical Analysis of Features
For the category of earlier texts we used the data published in Mohseni et al. [7], where the texts of the JEFP corpus were analysed. For contemporary texts we created a series of seven observables for each text in the JCEFP corpus, following the procedure of Mohseni et al. [7]. We determined sentence lengths and the number of specific POS tags in windows of 25 tokens for six POS tags (see Section 2.3). For each series we computed ApEn and ShEn values (Section 2.4). We then compared the text categories using their median values because a Kolmogorov-Smirnov test indicated that some features were not normally distributed. For our statistical comparison we used the non-parametric Mann-Whitney U test.
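The category comparison described above (medians plus a Mann-Whitney U test) can be sketched in plain Python. This is an illustrative version only: it uses the normal approximation with tie correction and without continuity correction, which may differ from the statistical software used in the study, and the feature values below are invented for the example.

```python
import math
from statistics import median


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns the U statistic of sample `a` and an approximate p-value.
    """
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    n1, n2 = len(a), len(b)
    n = n1 + n2
    rank_sum_a, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u1 = rank_sum_a - n1 * (n1 + 1) / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:  # all observations identical
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Invented feature values for two text categories (illustration only).
preferred = [0.92, 0.95, 0.97, 0.99, 1.01]
non_preferred = [0.85, 0.88, 0.90, 0.93, 0.94]
medians = (median(preferred), median(non_preferred))
u_stat, p_value = mann_whitney_u(preferred, non_preferred)
```

For small samples an exact test would be preferable to the normal approximation; the sketch is meant to show the logic of the rank-based comparison, not to reproduce the reported p-values.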
Tables 3 and 4 (left-hand side) compare the contemporary bestselling and non-bestselling texts in terms of ApEn and ShEn, respectively. The values of the features for earlier canonical and non-canonical texts are shown on the right-hand side. These data have been taken from Mohseni et al. [7]. To facilitate the comparison of values for each text category/feature combination, the (significantly) higher value of each pair is shown in boldface. For Noun, Verb, Adjective and Preposition, the category of bestseller has higher values than the non-bestselling texts in the contemporary corpus. In both categories, the values for sentence length are not significantly different from each other. Only in one major POS category, i.e., Pronoun, are the values for ApEn and ShEn higher for contemporary non-bestselling texts than for the bestsellers.
Table 3. Median values of Approximate Entropy (ApEn) for all text properties and for all fictional text categories. ApEn values for contemporary bestselling (N = 94) vs. non-bestselling (N = 110) texts, and for canonical (N = 76) vs. non-canonical (N = 130) texts. The asterisks indicate whether the differences between the two text categories in the earlier or contemporary periods are statistically significant (Mann-Whitney U test; ns, not significant; *, p ≤ 0.05; **, p ≤ 0.01; and ***, p ≤ 0.001). Values that are significantly higher within a pair of columns are shown in boldface. The 95% confidence intervals for the median (according to [43]) are shown in parentheses. The data for earlier texts are from the study by Mohseni et al. [7].
If we compare earlier and contemporary texts in the fictional categories, we observe both differences and similarities. In earlier texts the values for all POS tags are higher for canonical texts than for non-canonical texts. Contemporary texts do not show any difference for the category of Adverb. For Pronoun, the value is higher for the non-bestselling texts.
In summary, we observe a similar pattern for prepositions and the three POS tags representing major classes of content words, i.e., Noun, Verb and Adjective. Thus, the biggest difference in the comparison of preferred vs. non-preferred texts in the earlier and contemporary periods lies in the distribution of pronouns. Notably, ApEn and ShEn exhibit similar patterns of differences for all comparisons.

Examples of texts with a high degree of unpredictability in the JEFP corpus are Ulysses by James Joyce, The Golden Bowl by Henry James and Sartor Resartus by Thomas Carlyle, showing the highest ApEn values in the category of earlier canonical texts for Noun, Verb and Adjective, respectively. In the bestsellers category among the contemporary texts, Port Mortuary by Patricia Cornwell has the highest ApEn value for Noun and the highest ShEn value for Verb. Another prominent example is Freedom by Jonathan Franzen, which is the bestseller with the highest ApEn value for Adjective in the corpus.
Table 4. Median values of Shannon Entropy (ShEn) for all text properties and for all fictional text categories. ShEn values for contemporary bestselling (N = 94) vs. non-bestselling (N = 110) texts, and for canonical (N = 76) vs. non-canonical (N = 130) texts. The asterisks indicate whether the differences between the two text categories in the earlier or contemporary periods are statistically significant (Mann-Whitney U test; ns, not significant; *, p ≤ 0.05; **, p ≤ 0.01; and ***, p ≤ 0.001). Values that are significantly higher within a pair of columns are shown in boldface. The 95% confidence intervals for the median (according to [43]) are shown in parentheses. The data for earlier texts are from the study by Mohseni et al. [7].
Tables S2 and S3 show the results for fictional and non-fictional texts for ApEn and ShEn, respectively. We refer the interested reader to these two supplementary tables, to gain an impression of the comparison between fictional and non-fictional texts. Summarizing the results, there is no uniform pattern in the degree of (un)predictability in fictional or non-fictional texts. For some text properties, such as Verb and Adjective, the values of ApEn and ShEn are higher for fictional than non-fictional texts, while for other text properties, such as Adverb and Pronoun, the opposite pattern can be observed.
Moreover, the values of ApEn and ShEn do not correspond to each other in measuring the degree of (un)predictability in the fictional or non-fictional text categories. Figure S3 in the Supplementary Materials shows a correlation plot for ApEn and ShEn values for all earlier and contemporary text categories, for all text properties. For some text properties, such as Adjective and Adverb, the correlation coefficients are very high, while for others, such as Noun and Verb, they are lower. This finding is related to the difference between the discrimination power of ApEn and ShEn, which becomes visible when we look at the classification results in the next section.

Classification
We extend our analysis of preferred vs. non-preferred texts with a classification task. Classification determines the performance of each property/feature in distinguishing the text categories under analysis. For each setting we trained a support vector machine (SVM) with a radial basis function (RBF) kernel. To report the performance of the classification models, we used balanced accuracy, which eliminates the undesired effect of different class sizes in the input data. In the comparison of the classification results we rely on the 5 × 2CV paired t-test [44] with a significance level of α = 0.05. Table 5 shows the balanced accuracy scores for bestselling vs. non-bestselling contemporary texts, for each text property/feature combination. To compare contemporary and earlier texts, we also include the classification results of canonical vs. non-canonical earlier texts, which were published in Mohseni et al. [7] (right-hand side of Table 5).
Table 5. Balanced accuracy of classification (%) for the single features for the bestselling/non-bestselling contemporary texts distinction and for the canonical/non-canonical early texts distinction. Values that are significantly higher within a pair of columns are shown in boldface. Wherever the results are not significantly better than random accuracy (50%), we mark the result with a dagger (†). The data for earlier texts are from the study by Mohseni et al. [7].
In the task of classifying bestselling vs. non-bestselling contemporary texts, both ApEn and ShEn perform comparably well, except for Noun and Verb, where ApEn provides a significantly higher accuracy compared to ShEn.
Comparing accuracy scores for the two time periods, we observe a shift in the performance of individual text properties. While ApEn of all text properties except Adverb distinguishes canonical from non-canonical earlier texts better than ShEn, the ApEn values of only two text properties in the contemporary texts, i.e., Noun and Verb, provide a better performance compared to ShEn. For other text properties, no significant difference was observed.
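The two evaluation ingredients behind the scores discussed above can be sketched as follows: balanced accuracy as the mean of per-class recalls, and the t statistic of the 5 × 2CV paired t-test in its commonly used formulation. The function names and the toy inputs are hypothetical, and the sketch does not reproduce the study's classification pipeline.

```python
import math


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, insensitive to unequal class sizes."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def five_by_two_cv_t(diffs):
    """t statistic of the 5 x 2CV paired t-test.

    `diffs` holds five pairs (p1, p2): the score differences between two
    models on the two folds of each of five 2-fold cross-validation runs.
    The statistic is compared with a t distribution with 5 degrees of freedom.
    """
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    return diffs[0][0] / math.sqrt(sum(variances) / len(variances))


# A classifier that always predicts the majority class: 80% plain accuracy,
# but only 50% balanced accuracy.
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 10
majority_score = balanced_accuracy(y_true, y_pred)

# Hypothetical score differences between two feature sets.
t_stat = five_by_two_cv_t([(0.02, 0.04)] * 5)
```

With α = 0.05 and 5 degrees of freedom, a two-sided test rejects the null hypothesis of equal performance when the absolute value of the statistic exceeds approximately 2.571.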

The last row of Table 5 shows the performance of classification using all features. No significant difference between the discriminative power of ApEn and ShEn for the bestselling/non-bestselling contemporary texts distinction can be observed. Moreover, the results show that classification using the ApEn values of all text properties cannot distinguish the text categories under study better than ApEn of Noun alone. The difference between the two values is not statistically significant. Using ShEn of all text properties surpasses the performance of all individual ShEn features.
Concerning the results based on all features for earlier texts, ApEn outperforms ShEn by a wide margin in the classification of canonical versus non-canonical texts. Taking all text properties into account, the difference between the performance of ApEn and ShEn in the separation of preferred and non-preferred contemporary texts disappears. Nevertheless, the classification accuracy for both features remains comparably high (79.4% and 77.6%, respectively), which confirms that (un)predictability analysis is a promising approach for analysing texts of different aesthetic categories.

Discussion and Conclusions
Confirming the results obtained by Mohseni et al. [7] for texts from the 19th and early 20th centuries, our study shows that the degree of preference associated with a contemporary text also has correlates in global statistical properties of the text. Generally speaking, preferred texts (bestsellers) are characterized by lower degrees of predictability for most features, as reflected in higher values for the two entropy measures, Shannon Entropy and Approximate Entropy (Tables 3 and 4).
However, we also found differences between contemporary and earlier texts. The earlier texts were better distinguished by Approximate Entropy than by Shannon Entropy (Table 5) [7]. This shows that the two text categories not only differ in terms of the unpredictability of the part-of-speech rates across windows of text (Shannon Entropy); the part-of-speech rates are also less predictable along the sequential organization of a text (Approximate Entropy). After reading a window of 25 words, a reader has less information about the part-of-speech distribution in the next window of 25 words, in preferred (canonical) texts compared to non-preferred (non-canonical) texts. This is different for the contemporary texts. Approximate Entropy does not globally provide better classification results than Shannon Entropy for this part of the corpus. Only two part-of-speech categories, Noun and Verb, exhibit higher classification accuracy values on the basis of Approximate Entropy than they do based on Shannon Entropy. When all parts of speech as well as sentence length are taken into consideration, there is no significant difference between the classification results (see Table 5). This shows that bestsellers generally exhibit a higher degree of irregularity in the distribution of the linguistic features used for this study than non-bestsellers. The degree of irregularity is not modulated locally, however, and does not depend on the sequential arrangement of structural features.
A second difference between the two time periods is that in the earlier works from the 19th and early 20th centuries, all part-of-speech tags were distributed more unpredictably in the canonical texts than in the non-canonical ones (Tables 3 and 4). For canonical texts, a low degree of predictability seems to be a general design principle. For contemporary texts, one part of speech, Pronoun, had higher entropy values for the non-bestselling texts compared to the bestselling texts. Moreover, there was no significant difference in the distribution of Adverbs. It seems that only the major classes of content words, Nouns, Verbs and Adjectives as well as Prepositions, whose occurrence correlates with that of nouns, are distributed more unpredictably in bestselling texts as opposed to non-bestselling contemporary texts.
There are at least four possible explanations for the observed differences. The first explanation is based on changes in writing styles. It is well known that narrative styles have changed considerably since the 17th century [25]. This concerns, among other things, the narrator's visibility and reliability, and the relationship between the narrator and the reader. Moreover, the inventory of registers used in novels has been broadened. For example, the technique of interior monologue was introduced in modernism [45]. The high degree of unpredictability of POS tags in modern bestsellers, in comparison to non-bestsellers, points to a higher degree of heterogeneity of discourse modes in the former group of texts (Tables 3 and 4). However, the fact that Approximate Entropy does not separate the classes better than Shannon Entropy for all POS tags suggests that the sequential arrangement of discourse modes is no less predictable in bestsellers (Table 5). Simplifying this hypothesis, we speculate that bestselling authors draw on a more varied inventory of discourse modes than other authors, but the texts do not exhibit a higher degree of unpredictability as far as the sequential arrangement of these modes is concerned. This hypothesis would require closer inspection of the data, and additional methods that allow us to trace the trajectory of discourse modes across a text.
Related to this first explanation is a second one, which concerns the question of register and genre. Writing styles have not only changed 'locally' [25], but there are also shifts in the frequency of literary genres. Among the contemporary texts, specific genres that are rare in the category of canonical texts (e.g., crime stories) seem to be particularly successful. As we have no reliable genre classification for our sample, we cannot test for the effect of genre directly. We did, however, conduct an experiment on another corpus, a large collection of fictional texts from several genres (see the Supplementary Materials, Section S3). The results show that distributions of Approximate Entropy and Shannon Entropy vary significantly between genres. However, there is no general pattern across textual properties: there is no genre that exhibits particularly high or low values for all part-of-speech frequencies and sentence length values. While the effect of genre and register as determinants of preference undoubtedly needs to be taken into account, the results of our preliminary study suggest that they may have a modulating, rather than a direct, effect. Further studies are needed to test this assumption.
A third possible explanation for the observed differences between contemporary and earlier texts is provided by the factor of 'technology'. The process of writing has changed considerably between the earlier period (the 19th and early 20th centuries) and today. While the earlier texts were written either by hand or with a typewriter, contemporary writers can use computers. Texts can easily be edited, and re-edited, and the process of writing requires less planning than it used to. As a consequence, the difference between preferred and non-preferred texts may have decreased, as far as sequential organization is concerned, as the skills of a writer (as the architect of a story) may be less visible in contemporary texts. The general distributions of discourse modes, however, would not be affected by the process of writing, as they seem to be primarily a function of the author's creativity.
Finally, it is of course conceivable that the two types of preference that we considered (canonization for the earlier texts, sales figures for the contemporary texts) are driven by different forces. The process of canonization is, to a large extent (though not exclusively), driven by academics. It is based on thorough analyses conducted by a community of researchers over an extended period of time. Bestselling books, in contrast, have not gone through this type of filter. For a text to succeed on the book market, it has to be advertised broadly and supported by the media, e.g., with reviews and public discussion. Even though literary critics play an important role in this process, they may have a comparatively small impact on the success of a book (sometimes, negative reviews increase the sales figures, as they lead to controversial public discussion).
From the perspective of empirical aesthetics, it seems conceivable that the design principles of canonical literature (variation both in global distribution and sequential organization) play a less important role in the commercial success of a (contemporary) work. While canonical literature typically targets 'educated readers', contemporary bestsellers have a broader target audience; in fact, they tend to target an audience as broad as possible. Aesthetic pleasure varies from reader to reader (see, for example, [46], and for poetry [47,48]). More experienced (or even professional) readers may take pleasure in reading less predictable texts than less experienced readers do.
Unfortunately, we cannot use the same type of operationalization for preference for contemporary and earlier texts, as sales figures (at the time of publication) are not available for the canonical texts (and today's sales figures are, again, influenced by canonization), and because contemporary texts are too young to be canonized. An alternative way of measuring preference for contemporary texts may be literary prizes. As mentioned in Section 1, the Nobel Prize has been used as an indicator of preference [4,6]. A comparison between our data and Nobel Prize-winning books is another project that would broaden our understanding of structural reflexes of preference, and of preference itself.
The program of computational textual aesthetics has been heavily influenced by relevant studies from other domains. For example, statistical properties of (time) series have been analysed for music [31], poetry [49], and even bird song [50]. Measures such as autocorrelation, variability, surprise and predictability have also been used to predict musical preferences in humans [30,51]. As our own work has been influenced by the work on vision, we conclude with a remark on how our results relate to the visual domain. Here, basic perceptual features are also richer and more variable (or less predictable) in artworks than in many types of non-art images. Examples include the spatial distribution of luminance and colour edges across an image [52] and other basic visual features, such as edge orientation, spatial frequency tuning and colour-opponent spatial organization [12,53]. Whether a high degree of variation in such basic perceptual features is universal across aesthetic domains (visual art, literature, dance, music, etc.) is unclear at present.
In relevant studies, perceptual (structural) differences between traditional visual artworks and contemporary art have been observed. With the rise of modern art at the end of the 19th century, the pattern of image properties in visual artworks diversified [54,55]. In parallel, perceptual features that mediate the sensual beauty of artworks became less central for aesthetic judgements. Instead, image content and cultural context emerged as guides of what beholders prefer [56].
We speculate that there are parallels between aesthetic experience in the visual domain and in reading. In both domains, aesthetic preference seems to be related to the interplay between predictability and surprise. Our results are compatible with the hypothesis that the determinants of aesthetic experience in reading, like those in vision, are partly time-invariant, and partly culturally determined. A certain amount of variability and unpredictability, reflected in Approximate Entropy and Shannon Entropy in the present study, seems to be a good candidate for a time-invariant factor. However, in order to gain a better understanding of the determinants of preference in reading, several follow-up studies as sketched above will be needed.